Systems and methods for constructing a three-dimensional model from two-dimensional images

ABSTRACT

Systems and methods for generating a three-dimensional (3D) model of a user's dental arch based on two-dimensional (2D) images include a model generation engine that receives one or more images of a dental arch of a user. The model generation engine generates a point cloud based on the images of the dental arch of the user. The model generation engine generates a 3D model based on the point cloud.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 16/696,468, filed Nov. 26, 2019, the contents of which are herein incorporated by reference in its entirety.

BACKGROUND

The present disclosure relates generally to constructing three-dimensional models for use in manufacturing dental appliances. More specifically, the present disclosure relates to constructing three-dimensional models of a user's dental arch from two-dimensional images of the user's dental arch to manufacture dental aligners.

Dental aligners for repositioning a user's teeth may be manufactured for the user based on a 3D model of the user's teeth. The 3D model can be generated from a dental impression or an intraoral scan of the user's teeth. Dental impressions for generating such a 3D model can be taken by a user or an orthodontic professional using a dental impression kit. An intraoral scan of the user's mouth can be taken using 3D scanning equipment. However, these methodologies for obtaining the information necessary to generate a 3D model of the user's teeth can be time consuming and prone to errors made by the user or orthodontic professional, and can require specialized equipment.

SUMMARY

At least one embodiment relates to a method. The method includes receiving, by a model generation system, one or more images of a dental arch of a user. The method includes generating, by the model generation system, a point cloud based on data from the one or more images of the dental arch of the user. The method includes generating, by the model generation system, a three-dimensional (3D) model of the dental arch of the user based on the point cloud. The method includes manufacturing, based on the 3D model of the dental arch of the user, a dental aligner specific to the user and configured to reposition one or more teeth of the user.

Another embodiment relates to a method. The method includes generating, by an image detector from one or more images of a dental arch of a user, an image feature map including a classification of a plurality of portions of the one or more images. Each classification corresponds to a feature within the respective portion of the one or more images. The method includes generating, by a model generation engine, a point cloud using the one or more images. Generating the point cloud includes computing, by an encoder, a probability of each feature of the image feature map using one or more weights. Generating the point cloud includes generating, by an output engine, a point cloud for the image feature map using the probabilities. Generating the point cloud includes computing, by a decoder, a loss function based on a difference between features from the point cloud and corresponding probabilities of features of the image feature map. Generating the point cloud includes training, by the encoder, the one or more weights for computing the probability based on the computed loss function. The method includes generating, by the model generation engine based on the point cloud, a three-dimensional (3D) model of the dental arch of the user, the 3D model corresponding to the one or more images.

Another embodiment relates to a system. The system includes a processing circuit comprising a processor communicably coupled to a non-transitory computer readable medium. The processor is configured to execute instructions stored on the non-transitory computer readable medium to receive one or more images of a dental arch of a user. The processor is further configured to execute instructions to generate a point cloud based on data from the one or more images of the dental arch of the user. The processor is further configured to execute instructions to generate a three-dimensional (3D) model of the dental arch of the user based on the point cloud. The processor is further configured to execute instructions to transmit the 3D model of the dental arch of the user to a manufacturing system for manufacturing a dental aligner based on the 3D model. The dental aligner is specific to the user and configured to reposition one or more teeth of the user.

This summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the devices or processes described herein will become apparent in the detailed description set forth herein, taken in conjunction with the accompanying figures, wherein like reference numerals refer to like elements.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system for generating a three-dimensional (3D) model from one or more two-dimensional (2D) images, according to an illustrative embodiment.

FIG. 2A is an illustration of a first example image of a patient's mouth, according to an illustrative embodiment.

FIG. 2B is an illustration of a second example image of a patient's mouth, according to an illustrative embodiment.

FIG. 2C is an illustration of a third example image of a patient's mouth, according to an illustrative embodiment.

FIG. 3 is a block diagram of an image feature map generated by the system of FIG. 1, according to an illustrative embodiment.

FIG. 4 is a block diagram of a neural network which may be implemented within one or more of the components of FIG. 1, according to an illustrative embodiment.

FIG. 5A is an illustration of an example point cloud overlaid on a digital model of an upper dental arch, according to an illustrative embodiment.

FIG. 5B is an illustration of an example point cloud overlaid on a digital model of a lower dental arch, according to an illustrative embodiment.

FIG. 5C is an illustration of a point cloud including the point clouds shown in FIG. 5A and FIG. 5B, according to an illustrative embodiment.

FIG. 6 is a diagram of a method of generating a 3D model from one or more 2D images, according to an illustrative embodiment.

FIG. 7 is a diagram of a method of generating a point cloud from one or more 2D images, according to an illustrative embodiment.

DETAILED DESCRIPTION

Before turning to the figures, which illustrate certain exemplary embodiments in detail, it should be understood that the present disclosure is not limited to the details or methodology set forth in the description or illustrated in the figures. It should also be understood that the terminology used herein is for the purpose of description only and should not be regarded as limiting.

Referring generally to the figures, described herein are systems and methods for generating a three-dimensional (3D) model of a user's dental arch from two-dimensional (2D) images. A model generation system receives images of the user's dental arch, generates a point cloud using the images of the user's dental arch, and manufactures dental aligner(s) based on the point cloud. The systems and methods described herein have many advantages over other implementations. For instance, the systems and methods described herein expedite the manufacturing and delivery of dental aligners to a user by more efficiently generating 3D models of the user's dentition without requiring the user to administer a dental impression kit, conduct a scan of their dentition, or attend an appointment with a dentist or orthodontist. By not requiring an appointment with a dentist or orthodontist, such systems and methods may make users more comfortable and confident with receiving orthodontic treatment, and avoid delays in receiving orthodontic treatment due to needing to retake dental impressions or a scan of the user's teeth. If an additional 2D image of the user's dentition is needed, such images can easily be acquired by taking an additional photograph of the user's dentition, whereas a user undergoing a more traditional orthodontic treatment would be required to obtain an impression kit or visit a dentist or orthodontist to have an additional scan of their dentition conducted. Instead of requiring the user to administer dental impressions or visit an intraoral scanning site for receiving an intraoral scan of the user's dentition, the systems and methods described herein leverage images captured by the user to manufacture dental aligners. As another example, the systems and methods described herein may be used to manufacture dental aligners by supplementing data regarding the user's dentition, for example, acquired by an intraoral scan, or a dental impression administered by the user.

Referring now to FIG. 1, a system 100 for generating a three-dimensional (3D) model is shown according to an illustrative embodiment. The system 100 (also referred to herein as a model generation system 100) is shown to include a pre-trained image detector 102 and a model generation engine 104. As described in greater detail below, the pre-trained image detector 102 is configured to generate an image feature map from one or more images 106 of a mouth of a user. The model generation engine 104 is configured to generate a 3D model using the one or more images 106. The model generation engine 104 includes a long short-term memory (LSTM) encoder 108 configured to compute a probability of each feature of the image feature map using one or more weights. The model generation engine 104 includes an output engine 110 configured to generate a point cloud using data from the LSTM encoder 108. The model generation engine 104 includes a point cloud feature extractor 112 configured to determine features from the point cloud generated by the output engine 110. The model generation engine 104 includes an LSTM decoder 114 configured to determine a difference between features from the point cloud and corresponding probabilities of features of the image feature map. The LSTM encoder 108 trains the one or more weights for computing the probability based on the difference determined by the LSTM decoder 114. The model generation engine 104 iteratively cycles between the LSTM encoder 108, output engine 110, point cloud feature extractor 112, and LSTM decoder 114 to generate and refine point clouds corresponding to the images 106. At the final iteration, the output engine 110 is configured to generate the 3D model using the final iteration of the point cloud.

The model generation system 100 is shown to include a pre-trained image detector 102. The pre-trained image detector 102 may be any device(s), component(s), application(s), element(s), script(s), circuit(s), or other combination of software and/or hardware designed or implemented to generate an image feature map from one or more images 106. The pre-trained image detector 102 may be embodied on a server or computing device, embodied on a mobile device communicably coupled to a server, and so forth. In some implementations, the pre-trained image detector 102 may be embodied on a server which is designed or implemented to generate a 3D model using two-dimensional (2D) images. The server may be communicably coupled to a mobile device (e.g., via various network connections).

Referring now to FIG. 1 and FIG. 2A-FIG. 2C, the pre-trained image detector 102 may be configured to receive one or more images 106 of a mouth of a user, such as one or more 2D images. Specifically, FIG. 2A-FIG. 2C are illustrations of example images 106 of a user's mouth. The user may capture a first image 106 of a straight-on, closed view of the user's mouth by aiming a camera in a straight-on manner perpendicular to the labial surface of the teeth (shown in FIG. 2A), a second image 106 of a lower, open view of the user's mouth by aiming a camera from an upper angle down toward the lower teeth (shown in FIG. 2B), and a third image 106 of an upper, open view of the user's mouth by aiming a camera from a lower angle up toward the upper teeth (shown in FIG. 2C). The user may capture images 106 with a dental appliance 200 positioned at least partially within the user's mouth. The dental appliance 200 is configured to hold open the user's lips to expose the user's teeth and gingiva. The user may capture various image(s) 106 of the user's mouth (e.g., with the dental appliance 200 positioned therein). In some embodiments, the user takes two images of their teeth from substantially the same viewpoint (e.g., both from a straight-on viewpoint), or from substantially the same viewpoint but offset slightly. After capturing the images 106, the user may upload the images 106 to the pre-trained image detector 102 (e.g., to a website or internet-based portal associated with the pre-trained image detector 102 or model generation system 100, by emailing or sending a message of the images 106 to an email address or phone number or other account associated with the pre-trained image detector 102, and so forth).

The pre-trained image detector 102 is configured to receive the images 106 from the mobile device of the user. The pre-trained image detector 102 may receive the images 106 directly from the mobile device (e.g., by the mobile device transmitting the images 106 via a network connection to a server which hosts the pre-trained image detector 102). The pre-trained image detector 102 may retrieve the images 106 from a storage device (e.g., where the mobile device stored the images 106 on the storage device, such as a database or a cloud storage system). In some embodiments, the pre-trained image detector 102 is configured to score the images 106. The pre-trained image detector 102 may generate a metric which identifies the overall quality of the image. The pre-trained image detector 102 may include a Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE). The BRISQUE is configured to generate an image score within a range (e.g., between 0-100, for instance, with lower scores being generated for images having higher quality). The BRISQUE may be configured to generate the image score based on, for example, the measured pixel noise, image distortion, and so forth, to objectively evaluate the image quality. Where the image score does not satisfy a threshold, the pre-trained image detector 102 may be configured to generate a prompt for the user which directs the user to re-take one or more of the images 106.
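
As a non-limiting illustration of such a quality gate, the sketch below scores an image and flags it for re-capture when the score exceeds a threshold. BRISQUE itself is not reimplemented here; a simple variance-of-Laplacian sharpness measure stands in for the scorer, and the 50-point cutoff is an assumed value chosen for illustration only.

```python
# Minimal quality-gating sketch. A sharpness proxy stands in for BRISQUE;
# lower scores indicate higher quality, mimicking the BRISQUE convention.
import numpy as np

QUALITY_THRESHOLD = 50.0  # assumed cutoff on a 0-100 scale


def proxy_quality_score(image: np.ndarray) -> float:
    """Stand-in scorer: low Laplacian variance (a blurry image) maps to a
    high score, so lower scores correspond to sharper images."""
    gray = image.mean(axis=2) if image.ndim == 3 else image.astype(float)
    lap = (
        -4 * gray[1:-1, 1:-1]
        + gray[:-2, 1:-1] + gray[2:, 1:-1]
        + gray[1:-1, :-2] + gray[1:-1, 2:]
    )
    return float(np.clip(100.0 - lap.var(), 0.0, 100.0))


def needs_retake(image: np.ndarray) -> bool:
    """Return True when the image should be re-captured by the user."""
    return proxy_quality_score(image) > QUALITY_THRESHOLD
```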

Referring now to FIG. 1 and FIG. 3, the pre-trained image detector 102 is configured to process the image(s) 106 to generate an image feature map 300. Specifically, FIG. 3 is a block diagram of an image feature map 300 corresponding to one of the image(s) 106 received by the pre-trained image detector 102. The pre-trained image detector 102 may be configured to process images 106 received from the mobile device of the user to generate the image feature map 300. In some implementations, the pre-trained image detector 102 is configured to break down, parse, or otherwise segment the images 106 into a plurality of portions. In some implementations, the pre-trained image detector 102 is configured to segment the images 106 into a plurality of tiles 302. Each tile 302 corresponds to a particular portion, section, or region of a respective image 106. In some instances, the tiles 302 may have a predetermined size or resolution. For instance, the tiles 302 may have a resolution of 512 pixels×512 pixels (though the tiles 302 may have different sizes or resolutions). The tiles 302 may each be the same size, or some tiles 302 may have a different size than other tiles 302. In some embodiments, the tiles 302 may include a main portion 306 (e.g., located at or towards the middle of the tile 302) and an overlapping portion 308 (e.g., located along the perimeter of the tile 302). The main portion 306 of each tile 302 may be unique to each respective tile 302. The overlapping portion 308 may be a common portion shared with one or more neighboring tiles 302. The overlapping portion 308 may be used by the pre-trained image detector 102 for context in extracting features (e.g., tooth size, tooth shape, tooth location, tooth orientation, crown size, crown shape, gingiva location, gingiva shape or contours, tooth-to-gingiva interface location, interproximal region location, and so forth) from the main portion of the tile 302.
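
The following sketch illustrates one way such tiling could be implemented. The 512-pixel tile size comes from the description above, while the 32-pixel overlapping border is an assumed value used only for illustration.

```python
# Sketch of segmenting an image into fixed-size tiles, each carrying an
# overlapping border of context from its neighbors.
import numpy as np

TILE = 512     # main-portion size from the description above
OVERLAP = 32   # assumed width of the shared border between neighboring tiles


def tile_image(image: np.ndarray):
    """Yield (row, col, tile) where each tile includes OVERLAP pixels of
    context from neighboring tiles on every available side."""
    h, w = image.shape[:2]
    for r in range(0, h, TILE):
        for c in range(0, w, TILE):
            r0, c0 = max(r - OVERLAP, 0), max(c - OVERLAP, 0)
            r1, c1 = min(r + TILE + OVERLAP, h), min(c + TILE + OVERLAP, w)
            yield r, c, image[r0:r1, c0:c1]
```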

The pre-trained image detector 102 is configured to determine, identify, or otherwise extract one or more features from the tiles 302. In some implementations, the pre-trained image detector 102 includes an image classifier neural network 304 (also referred to herein as an image classifier 304). The image classifier 304 may be implemented using a neural network similar to the neural network 400 shown in FIG. 4 and subsequently described. For instance, the image classifier 304 may include an input layer (e.g., configured to receive the tiles 302), one or more hidden layers including various pre-trained weights (e.g., corresponding to probabilities of particular classifications for tiles 302), and an output layer. Each of these layers is described below. The image classifier neural network 304 of the pre-trained image detector 102 is configured to classify each of the tiles 302. The pre-trained image detector 102 may be implemented using various architectures, libraries, or other combinations of software and hardware, such as the MobileNet architecture, though other architectures may be used (e.g., based on balances between memory requirements, processing speeds, and performance). The pre-trained image detector 102 is configured to process each of the tiles 302 (e.g., piecewise) and stitch together the tiles 302 to generate the image feature map 300. Each classification for a respective tile 302 may correspond to an associated feature within the tile 302. Various examples of classifications include, for instance, a classification of a tooth (e.g., incisors or centrals, canines, premolars or bicuspids, molars, etc.) included in a tile 302, a portion of the tooth included in the tile 302 (e.g., crown, root), whether the gingiva is included in the tile 302, etc. Such classifications may each include corresponding features which are likely to be present in the tile. For instance, if a tile 302 includes a portion of a tooth and a portion of the gingiva, the tile 302 likely includes a tooth-to-gingiva interface. As another example, if a tile 302 includes a molar which shows the crown, the tile 302 likely includes a crown shape, crown size, etc.
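
As one hedged illustration of a tile classifier built on the MobileNet architecture mentioned above, the sketch below uses torchvision's MobileNetV2 backbone with a replaced classification head. The class list is illustrative; the disclosure does not fix an exact set of classifications.

```python
# Illustrative tile classifier: MobileNetV2 backbone with a new head sized
# to an assumed set of tile classes.
import torch
import torch.nn as nn
import torchvision.models as models

TILE_CLASSES = ["incisor", "canine", "premolar", "molar", "gingiva", "background"]


def build_tile_classifier(num_classes: int = len(TILE_CLASSES)) -> nn.Module:
    model = models.mobilenet_v2()                              # backbone (untrained here)
    model.classifier[1] = nn.Linear(model.last_channel, num_classes)
    return model


# Usage: classify a batch of 512x512 tiles (normalization omitted for brevity).
classifier = build_tile_classifier()
tiles = torch.rand(4, 3, 512, 512)                             # dummy batch of tiles
probs = torch.softmax(classifier(tiles), dim=1)                # per-tile class probabilities
```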

In some implementations, the pre-trained image detector 102 is configured to classify each of the tiles 302. For instance, the output from the image classifier 304 may be a classification (or probability of a classification) of the corresponding tile 302 (e.g., provided as an input to the image classifier 304). In such implementations, the image feature map 300 may include each of the tiles 302 with their corresponding classifications. The pre-trained image detector 102 is configured to construct the image feature map 300 by stitching together each of the tiles 302, with each tile 302 including its respective classification. In this regard, the pre-trained image detector 102 is configured to re-construct the images 106 by stitching together the tiles 302 to form the image feature map 300, with the image feature map 300 including the tiles 302 and corresponding classifications. The pre-trained image detector 102 is configured to provide the image feature map 300 as an input to a model generation engine 104. In some implementations, the image feature map 300 generated by the pre-trained image detector 102 may be a compressed file (e.g., zipped or another format). The pre-trained image detector 102 may be configured to format the image feature map 300 into a compressed file for transmission to the model generation engine 104. The model generation engine 104 may be configured to parse the image feature map 300 for generating a point cloud corresponding to the image(s) 106, as described in greater detail below.

The model generation system 100 is shown to include a model generation engine 104. The model generation engine 104 may be any device(s), component(s), application(s), element(s), script(s), circuit(s), or other combination of software and/or hardware designed or implemented to generate a three-dimensional (3D) model of a user's dental arch from one or more images 106 of the user's dentition. The model generation engine 104 is configured to generate the 3D model using a plurality of images 106 received by the pre-trained image detector 102 (e.g., from a mobile device of the user). The model generation engine 104 may include a processing circuit including one or more processors and memory. The memory may store various instructions, routines, or other programs that, when executed by the processor(s), cause the processor(s) to perform various tasks relating to the generation of a 3D model. In some implementations, various subsets of processor(s), memory, instructions, routines, libraries, etc., may form an engine. Each engine may be dedicated to performing particular tasks associated with the generation of a 3D model. Some engines may be combined with other engines. Additionally, some engines may be segmented into a plurality of engines.

The model generation engine 104 is shown to include a feature map reading engine 116. The feature map reading engine 116 may be any device(s), component(s), application(s), element(s), script(s), circuit(s), or other combination of software and/or hardware designed or implemented to read features from an image feature map 300. The feature map reading engine 116 may be designed or implemented to format, re-format, or modify the image feature map 300 received from the pre-trained image detector 102 for use by other components of the model generation engine 104. For instance, where the output from the pre-trained image detector 102 is a compressed file of the image feature map 300, the feature map reading engine 116 is configured to decompress the file such that the image feature map 300 may be used by other components or elements of the model generation engine 104. In this regard, the feature map reading engine 116 is configured to parse the output received from the pre-trained image detector 102. The feature map reading engine 116 may parse the output to identify the tiles 302, the classifications of the tiles 302, features corresponding to the classifications of the tiles 302, etc. The feature map reading engine 116 is configured to provide the image feature map 300 as an input to an LSTM encoder 108, as described in greater detail below.

Referring now to FIG. 1 and FIG. 4, the model generation engine 104 is shown to include an LSTM encoder 108 and LSTM decoder 114. Specifically, FIG. 4 is a block diagram of an implementation of a neural network 400 which may implement various components, features, or aspects within the LSTM encoder 108 and/or LSTM decoder 114. The LSTM encoder 108 may be any device(s), component(s), application(s), element(s), script(s), circuit(s), or other combination of software and/or hardware designed or implemented to compute a probability for each feature of the image feature map 300 using one or more weights. The LSTM decoder 114 may be any device(s), component(s), application(s), element(s), script(s), circuit(s), or other combination of software and/or hardware designed or implemented to determine a difference between features from a point cloud and corresponding probabilities of features of the image feature map 300 (e.g., computed by the LSTM encoder 108). The LSTM encoder 108 and LSTM decoder 114 may be communicably coupled to one another such that the outputs of one may be used as an input of the other. The LSTM encoder 108 and LSTM decoder 114 may function cooperatively to refine point clouds corresponding to the images 106, as described in greater detail below.

As shown in FIG. 4, the neural network 400 includes an input layer 402 including a plurality of input nodes 402 a-402 c, a plurality of hidden layers 404 including a plurality of perception nodes 404 a-404 h, and an output layer 406 including an output node 408. The input layer 402 is configured to receive one or more inputs via the input nodes 402 a-402 c (e.g., the image feature map 300, data from the LSTM decoder 114, etc.). The hidden layer(s) 404 are connected to each of the input nodes 402 a-402 c of the input layer 402. Each layer of the hidden layer(s) 404 is configured to perform one or more computations based on data received from other nodes. For instance, a first perception node 404 a is configured to receive, as an input, data from each of the input nodes 402 a-402 c, and compute an output by multiplying or otherwise applying weights to the input. As described in greater detail below, the weights may be adjusted at various times to tune the output (e.g., probabilities of certain features being included in the tiles 302). The computed output is then provided to the next hidden layer 404 (e.g., to perception nodes 404 e-404 h), which then computes a new output based on the output from perception node 404 a as well as outputs from perception nodes 404 b-404 d. In the neural network implemented in the LSTM encoder 108, for instance, the hidden layers 404 may be configured to compute probabilities of certain features in the images 106 of the user's dentition based on the image feature map 300 and data from the LSTM decoder 114, as described in greater detail below. For instance, the hidden layers 404 may be configured to compute probabilities of features, such as tooth size, tooth shape, tooth location, tooth orientation, crown size, crown shape, gingiva location, gingiva shape or contours, tooth-to-gingiva interface location, interproximal region location, and so forth. Together, such features describe, characterize, or otherwise define the user's dentition.
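
A minimal numeric sketch of the forward pass just described is shown below: each hidden node multiplies its inputs by weights and passes the result to the next layer. The layer sizes and random weights are illustrative only and do not reflect the trained network.

```python
# Toy forward pass: input nodes -> two weighted hidden layers -> output node.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # input layer (3 nodes) -> first hidden layer (4 nodes)
W2 = rng.normal(size=(4, 4))   # first hidden layer -> second hidden layer
W3 = rng.normal(size=(4, 1))   # second hidden layer -> output node


def forward(x: np.ndarray) -> np.ndarray:
    h1 = np.tanh(x @ W1)                  # weighted sum plus nonlinearity
    h2 = np.tanh(h1 @ W2)
    return 1 / (1 + np.exp(-(h2 @ W3)))   # output expressed as a probability


print(forward(np.array([0.2, 0.7, 0.1])))
```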

The LSTM encoder 108 is configured to compute a probability of each potential feature being present in the images 106. The LSTM encoder 108 is configured to receive the image feature map 300 (e.g., from the pre-trained image detector 102 directly, or indirectly from the feature map reading engine 116). The LSTM encoder 108 may be or include a neural network (e.g., similar to the neural network 400 depicted in FIG. 4) designed or implemented to compute a probability of the potential features in the images 106 using the image feature map 300. The LSTM encoder 108 may be configured to use data from the LSTM decoder 114 and the image feature map 300 for computing a probability of the features within the images 106. Each feature associated with an image (e.g., of a user's dentition) or a tile 302 for an image 106 may have a corresponding probability. The probability may be a probability or likelihood of a particular feature being present within the image 106 or tile 302 (e.g., a probability of a particular tooth size, tooth orientation, tooth-to-gingiva interface location, etc. within the image 106 or tile 302). For instance, neurons of the neural network may be trained to detect and compute a probability for various potential features described above within an image 106. The neurons may be trained using a training set of images and/or tiles and labels corresponding to particular features, using feedback from a user (e.g., validating outputs from the neural network), etc.

As an example, a lateral incisor may have several possible orientations. A neuron of the LSTM encoder 108 may be trained to compute probabilities of the orientation of the lateral incisor relative to a gingival line. The neuron may detect (e.g., based on features from the image feature map 300) the lateral incisor having an orientation extending 45° from the gingival line along the labial side of the dental arch. The LSTM encoder 108 is configured to compute a probability of the lateral incisor having the orientation extending 45° from the gingival line. As described in greater detail below, during subsequent iterations, the neuron may have weights which are further trained to detect the lateral incisor having an orientation extending 60° from the gingival line along the labial side of the dental arch and compute the probability of the lateral incisor having the orientation extending 60° from the gingival line. Through a plurality of iterations, the probabilities of the orientation of the lateral incisor are adjusted, modified, or otherwise trained based on determined orientations and feedback from the LSTM decoder 114. In this regard, the neurons of the LSTM encoder 108 have weights which are tuned, adjusted, modified, or otherwise trained over time to have both a long-term memory (e.g., through training of the 45° orientation in the example above) and a short-term memory (e.g., through training of the 60° orientation in the example above).

As such, the neurons are trained to detect that a tooth may have multiple possible features (e.g., a tooth may have an orientation of 45° or 60°, or other orientations detected through other iterations). Such implementations and embodiments provide for a more accurate overall 3D model which more closely matches the dentition of the user. The LSTM system is optimized to remember information from previous iterations and to incorporate that information as feedback for training the weights of the hidden layers 404 of the neural network, which in turn generate the output (e.g., via the output layer 406) used by the output engine 110 for generating the 3D model. In some implementations, the LSTM encoder 108 and LSTM decoder 114 may be trained with training sets (e.g., sample images). In other implementations, the LSTM encoder 108 and LSTM decoder 114 may be trained with images received from users (e.g., similar to images 106). In either implementation, the LSTM encoder 108 and LSTM decoder 114 may be trained to detect a large set of potential features within images of a user's dental arches (e.g., various orientations, sizes, etc. of teeth within a user's dentition). Such implementations may provide for a robust LSTM system by which the LSTM encoder 108 can compute probabilities of a given image containing certain features.

Referring back to FIG. 1, the LSTM encoder 108 is configured to generate an output of a plurality of probabilities of each feature based on the input (e.g., the image feature map 300 and inputs from the LSTM decoder 114 described in greater detail below) and weights from the neural network of the LSTM encoder 108. The output layer 406 of the neural network corresponding to the LSTM encoder 108 is configured to output at least some of the probabilities computed by the hidden layer(s) 404. The output layer 406 may be configured to output each of the probabilities, a subset of the probabilities (e.g., the highest probabilities, for instance), etc. The output layer 406 is configured to transmit, send, or otherwise provide the probabilities to a write decoder 118.

The write decoder 118 may be any device(s), component(s), application(s), element(s), script(s), circuit(s), or other combination of software and/or hardware designed or implemented to maintain a list of each of the probabilities computed by the LSTM encoder 108. The write decoder 118 is configured to receive the output from the LSTM encoder 108 (e.g., from the output layer 406 of the neural network corresponding to the LSTM encoder 108). In some implementations, the write decoder 118 maintains the probabilities in a ledger, database, or other data structure (e.g., within or external to the system 100). As probabilities are recomputed by the LSTM encoder 108 during subsequent iterations using updated weights, the write decoder 118 may update the data structure to maintain a list or ledger of the computed probabilities of each feature within the images 106 for each iteration of the process.

The output engine 110 may be any device(s), component(s), application(s), element(s), script(s), circuit(s), or other combination of software and/or hardware designed or implemented to generate a point cloud 500. FIG. 5A-FIG. 5C are illustrations of an example point cloud 500 overlaid on an upper dental arch 504A and a lower dental arch 504B, and a perspective view of the point cloud 500 for the upper and lower dental arch aligned to one another, respectively. The point clouds 500 shown in FIG. 5A-FIG. 5C are generated by the output engine 110. The output engine 110 may be configured to generate the point cloud 500 using the image(s) 106 received by the pre-trained image detector 102. As described in greater detail below, the output engine 110 may be configured to generate a point cloud 500 of a dental arch of the user using probabilities of features within one or more of the images 106. In some instances, the output engine 110 may be configured to generate a point cloud 500 of a dental arch using probabilities of features within one of the images 106. For instance, the output engine 110 may be configured to generate a point cloud 500 of an upper dental arch 504A using an image of an upper open view of the upper dental arch of the user (e.g., such as the image shown in FIG. 2C). In some instances, the output engine 110 may be configured to generate a point cloud 500 of the upper dental arch 504A using two or more images (e.g., the images shown in FIG. 2B and FIG. 2C, the images shown in FIG. 2A-FIG. 2C, or further images). In some instances, the output engine 110 may be configured to generate a point cloud 500 of the lower dental arch 504B using one image (e.g., the image shown in FIG. 2A), a plurality of images (e.g., the images shown in FIG. 2A-FIG. 2B, FIG. 2A-FIG. 2C), etc. The output engine 110 may be configured to combine the point clouds 500 generated for the upper and lower dental arch 504A, 504B to generate a point cloud 500, as shown in FIG. 5C, which corresponds to the mouth of the user. The output engine 110 may use each of the images 106 for aligning the point clouds of the upper and lower dental arch 504A, 504B.

The output engine 110 is configured to generate the point cloud 500 based on data from the LSTM encoder 108 via the write decoder 118. The output engine 110 is configured to parse the probabilities generated by the LSTM encoder 108 to generate points 502 for a point cloud 500 which correspond to features within the images 106. Using the previous example, the LSTM encoder 108 may determine that the highest probability of an orientation of a lateral incisor is 45° from the gingival line along the labial side. The output engine 110 may generate points 502 for the point cloud 500 corresponding to a lateral incisor having an orientation of 45° from the gingival line along the labial side. The output engine 110 is configured to generate points 502 in a 3D space corresponding to features having a highest probability as determined by the LSTM encoder 108, where the points 502 are located along an exterior surface of the user's dentition. In some instances, the output engine 110 may generate the points 502 at various locations within a 3D space which align with the highest probability features of the image(s) 106. Each point 502 may be located in 3D space at a location which maps to locations of features in the images. As such, the output engine 110 may be configured to generate points 502 for the point cloud 500 which match the probability of features in the images 106 (e.g., such that the points 502 of the point cloud 500 substantially match a contour of the user's dentition as determined based on the probabilities). The output engine 110 is configured to provide the point cloud 500 to the point cloud feature extractor 112.
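
One way to express this selection of highest-probability features is sketched below. The `feature_to_points` helper is a hypothetical placeholder; the disclosure does not specify how a chosen feature hypothesis (e.g., a lateral incisor at a 45° orientation) is mapped onto surface points.

```python
# Sketch: pick the most probable value for each feature, then emit 3D points
# consistent with those choices.
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def feature_to_points(feature: str, value: str) -> List[Point]:
    """Hypothetical placeholder mapping a feature hypothesis to surface points."""
    return []


def build_point_cloud(probabilities: Dict[str, Dict[str, float]]) -> List[Point]:
    cloud: List[Point] = []
    for feature, candidates in probabilities.items():
        best_value = max(candidates, key=candidates.get)   # most probable hypothesis
        cloud.extend(feature_to_points(feature, best_value))
    return cloud
```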

The point cloud feature extractor 112 may be any device(s), component(s), application(s), element(s), script(s), circuit(s), or other combination of software and/or hardware designed or implemented to determine one or more features within a point cloud 500. The point cloud feature extractor 112 may be configured to compute, extract, or otherwise determine one or more features from the point cloud 500 to generate an image feature map (e.g., similar to the image feature map received by the LSTM encoder 108). The point cloud feature extractor 112 may leverage one or more external architectures, libraries, or other software for generating the image feature map from the point cloud 500. In some implementations, the point cloud feature extractor 112 may leverage the PointNet architecture to extract feature vectors from the point cloud 500. In this regard, the images 106 are used (e.g., by the pre-trained image detector 102) for generating an image feature map 300, which is used (e.g., by the LSTM encoder 108 and output engine 110) to generate a point cloud 500, which is in turn used (e.g., by the point cloud feature extractor 112) to extract features. The point cloud feature extractor 112 is configured to transmit, send, or otherwise provide the extracted features from the point cloud 500 to the LSTM decoder 114.
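
A PointNet-style feature extractor can be sketched as a shared per-point multilayer perceptron followed by an order-invariant max-pool, as shown below. This is an illustrative stand-in written for this description, not the actual PointNet implementation.

```python
# PointNet-style sketch: a shared per-point MLP followed by a symmetric
# max-pool that yields a single feature vector for the whole cloud.
import torch
import torch.nn as nn


class PointCloudFeatureExtractor(nn.Module):
    def __init__(self, feature_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(          # applied identically to every point
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feature_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 3) -> (batch, feature_dim)
        per_point = self.mlp(points)
        return per_point.max(dim=1).values   # order-invariant pooling


features = PointCloudFeatureExtractor()(torch.rand(1, 1024, 3))
```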

The LSTM decoder 114 is configured to receive (e.g., as an input) the extracted features from the point cloud feature extractor 112 and the probabilities of features computed by the LSTM encoder 108. The LSTM decoder 114 is configured to compute, based on the extracted features and the probabilities, a difference between the output from the LSTM encoder 108 and the point cloud 500. In some implementations, the LSTM decoder 114 is configured to compute a loss function using the extracted features from the point cloud 500 and the corresponding probabilities of each feature from the image feature map 300. The LSTM decoder 114 may be configured to determine which features extracted from the point cloud 500 correspond to features within the image feature map 300. The LSTM decoder 114 may determine which features correspond to one another by comparing each feature (e.g., extracted from the point cloud 500 and identified in the image feature map 300) to determine which features most closely match one another. The LSTM decoder 114 may determine which features correspond to one another based on coordinates for points of the point cloud 500 and associated locations of tiles 302 in the image feature map 300 (e.g., the coordinates residing within one of the tiles 302, particular regions of the 3D space in which the points correspond to specific tiles 302, and so forth).

Once two features are determined (e.g., by the LSTM decoder 114) to correspond to one another, the LSTM decoder 114 compares the corresponding features to determine differences. For instance, where the feature is determined to be an orientation of a specific tooth, the LSTM decoder 114 is configured to compare the orientation of the feature from the image(s) 106 and the orientation from the point cloud 500. The LSTM decoder 114 is configured to compare the orientations to determine whether the feature represented in the point cloud 500 matches the feature identified in the image(s) 106 (e.g., the same orientation). In some implementations, the LSTM decoder 114 is configured to determine the differences by computing a loss function (e.g., using points 502 from the point cloud 500 and corresponding features from the image feature map 300). The loss function may be a computation of a distance between two points (e.g., a point 502 of the point cloud 500 and a corresponding feature from the image feature map 300). As the value of the loss function increases, the point cloud 500 is correspondingly less accurate (e.g., because the points 502 of the point cloud 500 do not match the features of the image feature map 300). Correspondingly, as the value of the loss function decreases, the point cloud 500 is more accurate (e.g., because the points 502 of the point cloud 500 more closely match the features of the image feature map 300). The LSTM decoder 114 may provide the computed loss function, the differences between the features, etc. to the LSTM encoder 108 (e.g., either directly or through the read decoder 120) so that the LSTM encoder 108 adjusts, tunes, or otherwise modifies the weights for computing the probabilities based on feedback from the LSTM decoder 114. In implementations in which the LSTM decoder 114 provides data to the LSTM encoder 108 through the read decoder 120, the read decoder 120 (e.g., similar to the write decoder 118) is configured to process the data from the LSTM decoder 114 to record the differences for adjustment of the weights of the LSTM encoder 108.
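
The disclosure describes the loss as a distance between points of the generated cloud and the corresponding image-derived targets. A symmetric Chamfer distance is one common concrete choice and is shown below as an assumption, not as the disclosure's exact formula.

```python
# Chamfer-style loss sketch: average nearest-neighbor distance in both
# directions between the generated cloud and the target points.
import torch


def chamfer_distance(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (num_points, 3) tensors of 3D points."""
    d = torch.cdist(pred, target)               # pairwise point-to-point distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()


loss = chamfer_distance(torch.rand(500, 3), torch.rand(480, 3))
```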

During subsequent iterations, the LSTM encoder 108 is configured to modify, refine, tune, or otherwise adjust the weights for the neural network 400 based on the feedback from the LSTM decoder 114. The LSTM encoder 108 may then compute new probabilities for features in the images 106, which are then used by the output engine 110 for generating points for a point cloud 500. As such, the LSTM decoder 114 and LSTM encoder 108 cooperatively adjust the weights for forming the point clouds 500 to more closely match the point cloud 500 to the features identified in the images 106. In some implementations, the LSTM encoder 108 and LSTM decoder 114 may perform a number of iterations. The number of iterations may be a predetermined number of iterations (e.g., two iterations, five iterations, 10 iterations, 50 iterations, 100 iterations, 200 iterations, 500 iterations, 1,000 iterations, 2,000 iterations, 5,000 iterations, 8,000 iterations, 10,000 iterations, 100,000 iterations, etc.). In some implementations, the number of iterations may change between models generated by the model generation system 100 (e.g., based on a user selection, based on feedback, based on a minimization or loss function or other algorithm, etc.). For instance, where the LSTM decoder 114 computes a loss function based on the difference between the features from the point cloud 500 and probabilities computed by the LSTM encoder 108, the number of iterations may be a variable number depending on the time for the loss function to satisfy a threshold. Hence, the LSTM encoder 108 may iteratively adjust weights based on feedback from the LSTM decoder 114 until the computed values for the loss function satisfy a threshold (e.g., an average of 0.05 mm, 0.1 mm, 0.15 mm, 0.2 mm, 0.25 mm, etc.). Following the final iteration, the output engine 110 is configured to provide the final iteration of the point cloud 500.
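
The overall encode/generate/extract/compare loop can be summarized schematically as follows. The component interfaces (`predict`, `generate`, `extract`, `compare`, `update_weights`) are assumed method names used only for illustration, and the 0.1 mm threshold and 10,000-iteration cap are example values drawn from the ranges listed above.

```python
# Schematic refinement loop: stop when the loss threshold is met or the
# iteration cap is reached, whichever comes first.
LOSS_THRESHOLD_MM = 0.1
MAX_ITERATIONS = 10_000


def refine_point_cloud(encoder, output_engine, extractor, decoder, feature_map):
    point_cloud = None
    for _ in range(MAX_ITERATIONS):
        probabilities = encoder.predict(feature_map)           # LSTM encoder output
        point_cloud = output_engine.generate(probabilities)    # candidate point cloud
        cloud_features = extractor.extract(point_cloud)        # PointNet-style features
        loss = decoder.compare(cloud_features, probabilities)  # loss function value
        if loss <= LOSS_THRESHOLD_MM:
            break
        encoder.update_weights(loss)                           # train on decoder feedback
    return point_cloud
```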

In some implementations, the output engine 110 is configured to merge the point cloud 500 with another point cloud or digital model of the user's dentition. For instance, the output engine 110 may be configured to generate a merged model from a first digital model (e.g., the point cloud 500) and a second digital model (e.g., a scan of a user's dentition, a scan of a dental impression of the user's dentition, etc.). In some implementations, the output engine 110 is configured to merge the point cloud 500 with another 3D model using at least some aspects as described in U.S. patent application Ser. No. 16/548,712, filed Aug. 22, 2019, the contents of which are incorporated herein by reference in its entirety.

The point cloud 500 may be used to manufacture a dental aligner specific to the user and configured to reposition one or more teeth of the user. The output engine 110 may be configured to provide the point cloud 500 to one or more external systems for generating the dental aligner. For instance, the output engine 110 may transmit the point cloud 500 to a 3D printer to print a positive mold using the point cloud. A material may be thermoformed to the positive mold to form a shape of a dental aligner, and the dental aligner may be cut from the positive mold. As another example, the output engine 110 may transmit the point cloud 500 to a 3D printer to directly print a dental aligner.

Referring now to FIG. 6, a diagram of a method 600 of generating a three-dimensional model from one or more two-dimensional images is shown according to an illustrative embodiment. The method 600 may be implemented by one or more of the components described above with reference to FIG. 1-FIG. 5. As an overview, at step 602, a model generation system 100 receives one or more images 106 of a mouth of a user. At step 604, the model generation system 100 generates a point cloud 500 from the one or more images 106. At step 606, the model generation system generates a three-dimensional (3D) model from the point cloud 500. At step 608, dental aligners are manufactured based on the 3D model.

At step 602, a model generation system 100 receives one or more images 106 of a mouth of a user. The images 106 may be captured by the user. The user may capture the images 106 of the user's mouth with a dental appliance 200 positioned at least partially therein. In some implementations, the user is instructed how to capture the images 106. The user may be instructed to take at least three images 106. The images 106 may be similar to those shown in FIG. 2A-FIG. 2C. The user may capture the image(s) 106 on their mobile device or any other device having a camera. The user may upload, transmit, send, or otherwise provide the image(s) 106 to the model generation system 100 (e.g., to an email or account associated with the model generation system 100, via an internet-based portal, via a website, etc.). The model generation system 100 receives the image(s) 106 (e.g., from the mobile device of the user). The model generation system 100 uses the image(s) 106 for generating a 3D model of the user's mouth, as described in greater detail below.

At step 604, the model generation system 100 generates a point cloud 500 from the one or more images. In some embodiments, the model generation system 100 generates the point cloud 500 based on data from the one or more images 106 of the dental arch of the user (e.g., received at step 602). The model generation system 100 may parse the images 106 to generate image feature maps 300. The model generation system 100 may compute probabilities of features of the image feature map 300. The model generation system 100 may generate a point cloud 500 using the probabilities of the features of the image feature map 300. The model generation system 100 may determine features of the point cloud 500. The model generation system 100 may determine differences between the features of the point cloud and corresponding probabilities of the features of the image feature map. The model generation system 100 may train weights for computing the probabilities. The model generation system 100 may iteratively refine the point cloud 500 until a predetermined condition is met. Various aspects in which the model generation system 100 generates the point cloud 500 are described in greater detail below with reference to FIG. 7.

At step 606, the model generation system 100 generates a three-dimensional (3D) model. The model generation system 100 generates a 3D model of the mouth of the user (e.g., a 3D model of the upper and lower dental arch of the user). In some embodiments, the model generation system 100 generates a first 3D model of an upper dental arch of the user, and a second 3D model of a lower dental arch of the user. The model generation system 100 may generate the 3D models using the generated point cloud 500 (e.g., at step 604). In some embodiments, the model generation system 100 generates the 3D model by converting a point cloud 500 for the upper dental arch and a point cloud 500 for the lower dental arch into a stereolithography (STL) file, with the STL file being the 3D model. In some embodiments, the model generation system 100 uses the 3D model for generating a merged model. The model generation system 100 may merge the 3D model generated based on the point cloud 500 (e.g., at step 606) with another 3D model (e.g., with a 3D model generated by scanning the user's dentition, with a 3D model generated by scanning an impression of the user's dentition, with a 3D model generated by scanning a physical model of the user's dentition which is fabricated based on an impression of the user's dentition, etc.) to generate a merged (or composite) model.
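
One way to convert a final point cloud into an STL surface is sketched below, assuming the Open3D library is available; the disclosure states only that the point clouds are converted into an STL file, not which surface-reconstruction method is used.

```python
# Sketch: reconstruct a triangle mesh from the point cloud via Poisson
# surface reconstruction and write it out as an STL file.
import numpy as np
import open3d as o3d


def point_cloud_to_stl(points: np.ndarray, out_path: str) -> None:
    """points: (N, 3) array of final point-cloud coordinates."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.estimate_normals()                                   # normals needed for Poisson
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=8)
    mesh.compute_triangle_normals()                          # STL export expects normals
    o3d.io.write_triangle_mesh(out_path, mesh)


# Usage: one mesh per arch, e.g. point_cloud_to_stl(upper_points, "upper_arch.stl")
```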

At step 608, dental aligner(s) are manufactured based on the 3D model. In some embodiments, a manufacturing system manufactures the dental aligner(s) based at least in part on the 3D model of the mouth of the user. The manufacturing system manufactures the dental aligner(s) by receiving the data corresponding to the 3D model generated by the model generation system 100. The manufacturing system may manufacture the dental aligner(s) using the 3D model generated by the model generation system 100 (e.g., at step 606). The manufacturing system may manufacture the dental aligner(s) by 3D printing a physical model based on the 3D model, thermoforming a material to the physical model, and cutting the material to form a dental aligner from the physical model. The manufacturing system may manufacture the dental aligner(s) by 3D printing a dental aligner using the 3D model. In any embodiment, the dental aligner(s) are specific to the user (e.g., interface with the user's dentition) and are configured to reposition one or more teeth of the user.

Referring now to FIG. 7, a diagram of a method 700 of generating a point cloud 500 from one or more two-dimensional images 106 is shown according to an illustrative embodiment. The method 700 may be implemented by one or more of the components described above with reference to FIG. 1-FIG. 5C. As an overview, at step 702, the model generation system 100 generates an image feature map 300 using one or more images. At step 704, the model generation system 100 computes a probability of each feature in the image feature map 300. At step 706, the model generation system 100 generates a point cloud 500. At step 708, the model generation system 100 determines features of the point cloud 500. At step 710, the model generation system 100 determines differences between features of the point cloud and features of the image feature map 300. At step 712, the model generation system 100 trains weights for computing probabilities. At step 714, the model generation system 100 determines whether a predetermined condition is satisfied. Where the predetermined condition is not satisfied, the method 700 loops back to step 704. Where the predetermined condition is satisfied, at step 716, the model generation system 100 outputs a final iteration of the point cloud.

At step 702, the model generation system 100 generates an image feature map 300 from the one or more images 106. In some embodiments, a pre-trained image detector 102 of the model generation system 100 generates the image feature map 300 from the image(s) 106 (e.g., received at step 602 of FIG. 6). The image feature map 300 may include a classification of a plurality of portions of the image(s) 106. Each classification may correspond to a feature within the respective portion of the image(s) 106 to be represented in the point cloud.

In some embodiments, the pre-trained image detector 102 may receive the image(s) 106 of the mouth of the user. The pre-trained image detector 102 portions the image(s) 106 received from the mobile device of the user. The pre-trained image detector 102 may portion the image(s) 106 into pre-determined sized portions. For instance, the pre-trained image detector 102 may portion the image(s) 106 into tiles 302. The tiles 302 may be equally sized portions of the image(s) 106. A plurality of tiles 302 corresponding to an image 106 may together form the image 106. The pre-trained image detector 102 may determine a classification of each of the portions of the image(s) 106 (e.g., of each tile 302 corresponding to an image 106). The pre-trained image detector 102 may determine the classification by parsing each portion of the image(s) 106. The pre-trained image detector 102 may parse portions of the image(s) 106 by leveraging one or more architectures, such as the MobileNet architecture. In some implementations, the pre-trained image detector 102 may include an image classifier 304, which may be embodied as a neural network. The image classifier 304 may include an input layer (e.g., configured to receive the tiles 302), one or more hidden layers including various pre-trained weights, and an output layer. The image classifier 304 may classify each of the tiles 302 based on the pre-trained weights. Each classification for a respective tile 302 may correspond to an associated feature. The pre-trained image detector 102 may generate the image feature map 300 using the portions of the image(s) 106 which include their respective classifications. For instance, following the tiles 302 being classified by the image classifier 304, the pre-trained image detector 102 may reconstruct the image(s) 106 as an image feature map 300 (e.g., by stitching together the tiles 302 to form the image feature map 300).

At step 704, the model generation system 100 computes a probability of features in the image feature map 300. In some embodiments, an LSTM encoder 108 of the model generation system 100 computes the probabilities. The LSTM encoder 108 may compute a probability for each feature of the image feature map 300 using one or more weights. The LSTM encoder 108 receives the image feature map 300 (e.g., generated at step 702). The LSTM encoder 108 parses the image feature map 300 to compute probabilities of features present in the image feature map 300. The LSTM encoder 108 may be embodied as a neural network including one or more nodes having weights which are tuned to detect certain features in an image feature map 300. The output of the neural network may be a probability of a corresponding feature in the image feature map. The LSTM encoder 108 may be tuned to detect and compute a probability of the potential features in the images 106 using the image feature map 300.

At step 706, the model generation system 100 generates a point cloud 500. In some embodiments, an output engine 110 of the model generation system 100 may generate the point cloud 500 using the probabilities (e.g., computed at step 704). The output engine 110 generates the point cloud 500 based on data from the LSTM encoder 108. The output engine 110 may generate the point cloud 500 using the probabilities which are highest. For instance, the output engine 110 may generate the point cloud 500 by parsing the data corresponding to the probabilities for each feature of the images 106. Each feature may include a corresponding probability. The output engine 110 may identify the most probable features of the images 106 (e.g., based on which probabilities are highest). The output engine 110 may generate a point cloud 500 using the most probable features of the images 106. The point cloud 500 includes a plurality of points which together define a surface contour of a 3D model. The surface contour may follow a surface of the user's dental arch such that the point cloud 500 matches, mirrors, or otherwise represents the user's dental arch.

At step 708, the model generation system 100 determines features of the point cloud 500. In some embodiments, a point cloud feature extractor 112 of the model generation system 100 determines one or more features from the point cloud 500 generated by the output engine 110 (e.g., at step 706). The point cloud feature extractor 112 may process the point cloud 500 to identify the features from the points of the point cloud 500. The point cloud feature extractor 112 may process the point cloud 500 independent of the probabilities computed by the LSTM encoder 108 and/or the image feature map 300. In this regard, the point cloud feature extractor 112 determines features from the point cloud 500 without feedback from the LSTM encoder 108. The point cloud feature extractor 112 may leverage data from one or more architectures or libraries, such as the PointNet architecture, for determining features from the point cloud.

At step 710, the model generation system 100 determines differences between features of the point cloud 500 (e.g., determined at step 708) and the features of the image feature map 300 (e.g., generated at step 702). In some embodiments, an LSTM decoder 114 of the model generation system 100 determines a difference between the features determined by the point cloud feature extractor 112 and corresponding features from the image feature map 300. The LSTM decoder 114 may compare features determined by the point cloud feature extractor 112 (e.g., based on the point cloud 500) and corresponding features from the image feature map 300 (e.g., probabilities of features computed by the LSTM encoder 108). The LSTM decoder 114 may compare the features to determine how accurate the point cloud 500 computed by the output engine 110 is in comparison to the image feature map 300.

In some embodiments, the LSTM decoder 114 may compute a loss function using the features extracted from the point cloud 500 (e.g., by the point cloud feature extractor 112) and corresponding probabilities of each feature of the image feature map 300. The LSTM decoder 114 may determine the difference based on the loss function. The LSTM encoder 108 may train the weights (described in greater detail below) to minimize the loss function computed by the LSTM decoder 114.

At step 712, the model generation system 100 trains weights for computing the probabilities (e.g., used at step 704). In some embodiments, the LSTM encoder 108 of the model generation system 100 trains the one or more weights for computing the probability based on the determined difference (e.g., determined at step 710). The LSTM encoder 108 may tune, adjust, modify, or otherwise train weights of the neural network used for computing the probabilities of the features of the image feature map 300. The LSTM encoder 108 may train the weights using feedback from the LSTM decoder 114. For instance, where the LSTM decoder 114 computes a loss function of corresponding feature(s) of the image feature map 300 and feature(s) extracted from the point cloud 500, the LSTM decoder 114 may provide the loss function value to the LSTM encoder 108. The LSTM encoder 108 may correspondingly train the weights for nodes of the neural network (e.g., for that particular feature) based on the feedback. The LSTM encoder 108 may train the weights of the nodes of the neural network to minimize the loss function or otherwise limit differences between the features of the point cloud 500 and features of the image feature map 300.

At step 714, the model generation system 100 determines whether a predetermined condition is met or satisfied. In some embodiments, the predetermined condition may be a predetermined or pre-set number of iterations in which steps 704-712 are to be repeated. The number of iterations may be set by a user, operator, or manufacturer of the dental aligners, may be determined based on an optimization function, etc. In some embodiments, the predetermined condition may be the loss function satisfying a threshold. For instance, the model generation system 100 may repeat steps 704-712 until the loss function value computed by the LSTM decoder 114 satisfies a threshold (e.g., the loss function value is less than 0.1 mm). Where the model generation system 100 determines the predetermined condition is not satisfied, the method 700 may loop back to step 704. Where the model generation system 100 determines the predetermined condition is satisfied, the method 700 may proceed to step 716.
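The looping behavior of steps 704-712 can be summarized with a small sketch. Both stopping criteria shown (an iteration cap and a loss threshold such as 0.1 mm) follow the two examples given above, while the callable run_iteration and the default values are illustrative assumptions.

```python
def refine_point_cloud(run_iteration, max_iterations: int = 50,
                       loss_threshold: float = 0.1):
    """Repeat steps 704-712 until a predetermined condition is satisfied.

    run_iteration is a hypothetical callable performing one pass of steps
    704-712 and returning (point_cloud, loss_value).
    """
    point_cloud, loss_value = None, float("inf")
    for _ in range(max_iterations):                # pre-set iteration count
        point_cloud, loss_value = run_iteration()  # one pass of steps 704-712
        if loss_value < loss_threshold:            # e.g., loss below 0.1 mm
            break
    return point_cloud
```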

At step 716, the model generation system 100 outputs the final iteration of the point cloud 500. In some embodiments, the output engine 110 of the model generation system 100 may output the point cloud 500. The output engine 110 may output a point cloud 500 for an upper dental arch of the user and a point cloud 500 for a lower dental arch of the user. Such point clouds 500 may be used for generating a 3D model, which in turn can be used for manufacturing dental aligners for an upper and lower dental arch of the user, as described above with reference to FIG. 6.
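The step of turning an output point cloud 500 into a 3D model is described only at a high level. The sketch below assumes Poisson surface reconstruction via the Open3D library as one plausible way to obtain a mesh for a single arch; this technique and the depth parameter are assumptions, not the disclosed method.

```python
import numpy as np
import open3d as o3d

def point_cloud_to_mesh(points: np.ndarray) -> o3d.geometry.TriangleMesh:
    """Convert an (N, 3) point cloud for one dental arch into a surface mesh.

    Poisson reconstruction is shown only as an example; any surface
    reconstruction over the point cloud's contour could be substituted.
    """
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.estimate_normals()   # Poisson reconstruction requires point normals
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=8)
    return mesh
```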

As utilized herein, the terms “approximately,” “about,” “substantially,” and similar terms are intended to have a broad meaning in harmony with the common and accepted usage by those of ordinary skill in the art to which the subject matter of this disclosure pertains. It should be understood by those of skill in the art who review this disclosure that these terms are intended to allow a description of certain features described and claimed without restricting the scope of these features to the precise numerical ranges provided. Accordingly, these terms should be interpreted as indicating that insubstantial or inconsequential modifications or alterations of the subject matter described and claimed are considered to be within the scope of the disclosure as recited in the appended claims.

It should be noted that the term “exemplary” and variations thereof, as used herein to describe various embodiments, are intended to indicate that such embodiments are possible examples, representations, or illustrations of possible embodiments (and such terms are not intended to connote that such embodiments are necessarily extraordinary or superlative examples).

The term “coupled” and variations thereof, as used herein, means the joining of two members directly or indirectly to one another. Such joining may be stationary (e.g., permanent or fixed) or moveable (e.g., removable or releasable). Such joining may be achieved with the two members coupled directly to each other, with the two members coupled to each other using a separate intervening member and any additional intermediate members coupled with one another, or with the two members coupled to each other using an intervening member that is integrally formed as a single unitary body with one of the two members. If “coupled” or variations thereof are modified by an additional term (e.g., directly coupled), the generic definition of “coupled” provided above is modified by the plain language meaning of the additional term (e.g., “directly coupled” means the joining of two members without any separate intervening member), resulting in a narrower definition than the generic definition of “coupled” provided above. Such coupling may be mechanical, electrical, or fluidic.

The term “or,” as used herein, is used in its inclusive sense (and not in its exclusive sense) so that when used to connect a list of elements, the term “or” means one, some, or all of the elements in the list. Conjunctive language such as the phrase “at least one of X, Y, and Z,” unless specifically stated otherwise, is understood to convey that an element may be X, Y, or Z; X and Y; X and Z; Y and Z; or X, Y, and Z (i.e., any combination of X, Y, and Z). Thus, such conjunctive language is not generally intended to imply that certain embodiments require at least one of X, at least one of Y, and at least one of Z to each be present, unless otherwise indicated.

References herein to the positions of elements (e.g., “top,” “bottom,” “above,” “below”) are merely used to describe the orientation of various elements in the figures. It should be noted that the orientation of various elements may differ according to other exemplary embodiments, and that such variations are intended to be encompassed by the present disclosure.

The hardware and data processing components used to implement the various processes, operations, illustrative logics, logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, or any conventional processor, controller, microcontroller, or state machine. A processor also may be implemented as a combination of computing devices, such as a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In some embodiments, particular processes and methods may be performed by circuitry that is specific to a given function. The memory (e.g., memory, memory unit, storage device) may include one or more devices (e.g., RAM, ROM, flash memory, hard disk storage) for storing data and/or computer code for completing or facilitating the various processes, layers and circuits described in the present disclosure. The memory may be or include volatile memory or non-volatile memory, and may include database components, object code components, script components, or any other type of information structure for supporting the various activities and information structures described in the present disclosure. According to an exemplary embodiment, the memory is communicably connected to the processor via a processing circuit and includes computer code for executing (e.g., by the processing circuit or the processor) the one or more processes described herein.

The present disclosure contemplates methods, systems, and program products on any machine-readable media for accomplishing various operations. The embodiments of the present disclosure may be implemented using existing computer processors, or by a special purpose computer processor for an appropriate system, incorporated for this or another purpose, or by a hardwired system. Embodiments within the scope of the present disclosure include program products comprising machine-readable media for carrying or having machine-executable instructions or data structures stored thereon. Such machine-readable media can be any available media that can be accessed by a general purpose or special purpose computer or other machine with a processor. By way of example, such machine-readable media can comprise RAM, ROM, EPROM, EEPROM, optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code in the form of machine-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer or other machine with a processor. Combinations of the above are also included within the scope of machine-readable media. Machine-executable instructions include, for example, instructions and data which cause a general-purpose computer, special purpose computer, or special purpose processing machines to perform a certain function or group of functions.

Although the figures and description may illustrate a specific order of method steps, the order of such steps may differ from what is depicted and described, unless specified differently above. Also, two or more steps may be performed concurrently or with partial concurrence, unless specified differently above. Such variation may depend, for example, on the software and hardware systems chosen and on designer choice. All such variations are within the scope of the disclosure. Likewise, software implementations of the described methods could be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various connection steps, processing steps, comparison steps, and decision steps.

It is important to note that the construction and arrangement of the systems and methods shown in the various exemplary embodiments are illustrative only. Additionally, any element disclosed in one embodiment may be incorporated or utilized with any other embodiment disclosed herein.

What is claimed is:
1. A method comprising: receiving, by a model generation engine, one or more images of a dental arch of a user; generating, by the model generation engine, a point cloud based on data from the one or more images of the dental arch of the user, wherein generating the point cloud comprises: identifying, based on the one or more images, a plurality of features included in the one or more images; computing a probability for each feature; and generating the point cloud based on the plurality of features using the probabilities; and generating, by the model generation engine, a three-dimensional (3D) model of the dental arch of the user based on the point cloud.
2. The method of claim 1, further comprising manufacturing, based on the 3D model of the dental arch of the user, a dental aligner specific to the user and configured to reposition one or more teeth of the user.
3. The method of claim 1, wherein the one or more images are received by the model generation engine from a mobile device corresponding to the user.
4. The method of claim 1, wherein identifying the plurality of features in the one or more images comprises generating an image feature map including the plurality of features included in the one or more images.
5. The method of claim 4, wherein the image feature map includes a plurality of portions, and wherein each portion of the plurality of portions is classified based on a respective feature within the portion of the one or more images to be represented in the point cloud.
6. The method of claim 1, further comprising: identifying, by a point cloud feature extractor, a plurality of features of the point cloud; determining, by a decoder, a difference between the plurality of features identified by the point cloud feature extractor and corresponding features identified in the one or more images; and training, by an encoder, one or more weights for computing the probability of features being included in the one or more images based on the determined difference.
7. The method of claim 1, wherein the 3D model is a first 3D model, and wherein the method further comprises generating a merged model by merging the first 3D model with a second 3D model of the dental arch of the user.
8. The method of claim 1, wherein the 3D model is a first 3D model, the method further comprising comparing the first 3D model with a second 3D model.
9. The method of claim 8, wherein the second 3D model is generated based on a dental impression of the dental arch of the user.
10. A method comprising: identifying, by an image detector based on one or more images of a dental arch of a user, one or more features represented in a respective portion of a plurality of portions of the one or more images; generating, by a model generation engine, a point cloud using the one or more features identified in the one or more images and a respective probability of the one or more features; and generating, by the model generation engine based on the generated point cloud, a three-dimensional (3D) model of the dental arch of the user, the 3D model corresponding to the one or more images.
11. The method of claim 10, further comprising manufacturing, based on the 3D model of the dental arch of the user, a dental aligner specific to the user and configured to reposition one or more teeth of the user.
12. The method of claim 10, wherein the one or more images are received from a mobile device corresponding to the user.
13. The method of claim 10, wherein identifying the one or more features in the one or more images comprises generating an image feature map including the one or more features included in the one or more images.
14. The method of claim 13, wherein the image feature map is formed based on the plurality of portions, and wherein each portion of the plurality of portions is classified based on a respective feature within the portion of the one or more images to be represented in the point cloud.
15. The method of claim 10, further comprising: identifying, by a point cloud feature extractor, a plurality of features of the point cloud; determining, by a decoder, a difference between the plurality of features identified by the point cloud feature extractor and corresponding features identified in the one or more images; and training, by an encoder, one or more weights for computing a probability of features being included in the one or more images based on the determined difference.
16. The method of claim 10, wherein the 3D model is a first 3D model, and wherein the method further comprises generating a merged model by merging the first 3D model with a second 3D model of the dental arch of the user.
17. The method of claim 10, wherein the 3D model is a first 3D model, the method further comprising comparing the first 3D model with a second 3D model.
18. The method of claim 17, wherein the second 3D model is generated based on a dental impression of the dental arch of the user.
19. A system comprising: a processing circuit comprising a processor communicably coupled to a non-transitory computer readable medium, wherein the processor is configured to execute instructions stored on the non-transitory computer readable medium to: identify, based on one or more images of a dental arch of a user, a plurality of features represented in the one or more images; generate a point cloud using the one or more images and the plurality of features; and generate a three-dimensional (3D) model of the dental arch of the user based on the point cloud, wherein the 3D model corresponds to the one or more images.
20. The system of claim 19, wherein the 3D model is a first 3D model, and wherein the processor is further configured to compare the first 3D model with a second 3D model generated based on a dental impression of the dental arch of the user.